Abstract


Over the past years, deep learning capabilities and
the availability of large-scale training datasets have
advanced rapidly, leading to breakthroughs in face
recognition accuracy. However, these technologies 
are foreseen to face a major challenge in the next 
years due to the legal and ethical concerns about 
using authentic biometric data in AI model train- 
ing and evaluation along with increasingly utilizing 
data-hungry state-of-the-art deep learning models. 
With the recent advances in deep generative mod- 
els and their success in generating realistic and high- 
resolution synthetic image data, privacy-friendly syn- 
thetic data has been recently proposed as an alterna- 
tive to privacy-sensitive authentic data to overcome 
the challenges of using authentic data in face recog- 
nition development. This work aims at providing a 
clear and structured picture of the use-cases taxon- 
omy of synthetic face data in face recognition along 
with the recent emerging advances of face recognition 
models developed on the basis of synthetic data. We
also discuss the challenges facing the use of synthetic 
data in face recognition development and several fu- 
ture prospects of synthetic data in the domain of face 
recognition. 


1 Introduction 


The breakthroughs of deep neural networks and 
their training optimizations as well as the availability
of large-scale identity-labeled face datasets have
reshaped the research landscape of face recognition 
(FR) over the past years. These emerging technolo- 
gies have dramatically improved FR performances 
leading to the wider integration of FR in a variety 
of applications from logical access control and con- 
sumer low-end devices to automated border control. 
State-of-the-Art (SOTA) FR models [1, 2] utilized 
large-scale face datasets e.g. CASIA-WebFace [3], 
MS-Celeb-1M [4], or VGGFace2 [5] to train deep neu- 
ral networks (DNN) with millions of trainable pa- 
rameters, where the goal is to optimize the empir- 
ical risk minimization function given input training 
samples, their corresponding labels, and DNN train- 
able parameters. Achieving such a goal without being 
over-optimized, i.e. overfitted, requires that training 
datasets are of large scale (massive number of im- 
ages of many identities) and representative of various 
variations that exist in the real world. Large and
representative data is also required to evaluate FR
accuracies against the different variations that are
present in real operation scenarios, e.g. pose, aging,
occlusion, or lighting. Data is likewise required to
evaluate the vulnerability of FR against different
types of attacks such as morphing, presentation,
master-face, and deep fake attacks. Other FR-related
components, such as face processing models, attack
detectors, and face image quality estimation models,
are no different, as they also require face data for
training and evaluation. Besides the technical
limitation of collecting large-scale data with realistic
variations, there are increased concerns about collecting,
maintaining, redistributing, and using biometric data 
due to legal, ethical, and privacy concerns [6]. Con- 
sequently, many widely used datasets for FR devel- 
opment such as VGGFace2 [5] and MS-Celeb-1M [4] 
have been retracted by their creators. Table 1
summarizes the most widely used datasets to train FR
models. Even though many of these datasets have
been publicly released, they are no longer
accessible.
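
For reference, the empirical risk minimization objective mentioned above can be written as follows. The notation is introduced here for illustration only and is not taken from the cited works: $f_{\theta}$ denotes the DNN with trainable parameters $\theta$, $(x_i, y_i)$ the $N$ training samples and their identity labels, and $\ell$ the training loss.

```latex
\theta^{*} = \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \ell\big(f_{\theta}(x_i),\, y_i\big)
```

Because the sum runs only over the available training samples, a small or unrepresentative dataset lets the minimizer fit those samples without generalizing, which is the overfitting risk described above.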

Processing biometric data is governed by a set of 
legal restrictions [6]. Taking the General Data Protection
Regulation (GDPR) [6] as an example, it categorizes
biometric data as a special category of personal
data subject to rigorous data protection rules [7],
requiring high protection in connection with funda- 
mental rights and freedoms of individuals. Dealing 
with such data requires adherence to one of the ex- 
emptions of biometric data processing [8], the related 
national laws [9], maintaining processing records [10], 
and the preparation of a data protection impact
assessment [11, 12], among other restrictions. Depending
on the purpose of the biometric data processing,
this set of restrictions can be rigorously extended 
[13, 14, 15]. Besides the legal complications of us- 
ing and sharing biometric data, ethical requirements 
are commonly necessary, such as the approval of an 
ethics committee or competent authorities. 

The increased concerns about the legal and ethical 
use of authentic data in biometrics along with the 
technical limitation in collecting large and diverse 
face datasets motivate recent works to propose the 
use of synthetic data as an alternative to privacy- 
sensitive authentic data in FR training [16, 17, 18]. 
In an attempt to provide a clear understanding of
the feasibility of utilizing synthetic face data to train,
evaluate, or attack FR, or to enhance privacy, this work
is the first to analyze the properties needed of the
synthetic data for FR, the use-cases taxonomy of syn- 
thetic data in FR, the current state of synthetic-based 
FR, the limitations and challenges facing the use of 
current synthetic face data in FR, and possible fu- 
ture research directions that might give a larger space 
for synthetic data in different aspects of FR develop- 
ment. 


2 Where is the synthetic data 
used? 


To analyse the properties of the needed synthetic 
data, one should start by building a clear taxonomy 
of the different possible use-cases of synthetic data in
its interaction with FR. This taxonomy here will con- 
sider the operations where the synthetic data is used 
to interact with the recognition part of FR systems, 
i.e. the feature extraction. Therefore, synthetic data 
that is meant to interact with other system components,
as defined in ISO/IEC 19795-1:2021 [30],
is out of scope, e.g. synthetic data used to train
or evaluate face detection or segmentation solutions. 
Additionally, synthesizing faces as a means of domain 
transformation, e.g. from thermal to visible face ap- 
pearance [31] is also out of scope as it just transfers 
the appearance of the image. 

Figure 1 presents the use-case taxonomy of the syn- 
thetic face data interaction with FR. These use-cases 
are categorised under 4 groups, along with the prop- 
erties of the possibly needed data under each category 
(the latter will be discussed in detail in the next sec- 
tion). The four use-case categories are discussed in 
the following. 


1. Training FR: Modern FR solutions are based 
on deep learning models that are either trained 
directly to generate identity-discriminant feature 
representations (e.g. triplet loss [20]) or to clas- 
sify the identity classes in the training data (e.g. 
ArcFace [2], ElasticFace [1], etc.). In the latter 
approach, the embeddings preceding the classification
layer of the network are then used to extract
the identity-discriminant representations. This 
family of approaches is currently predominantly 
leading to SOTA FR performances. In both 
cases, training face data that represents the high 
inter and intra-class diversity of real applications 
is needed to train the models. As mentioned in 
the introduction, the diversity of such data, if 
authentic, is limited by practical data collection 
constraints, and its collection and handling are 
hedged by privacy, legal, and ethical concerns. 
Synthetic data can come in handy to train such 
FR models in different manners based on the 


Table 1: Overview of the most widely used authentic and synthetic facial datasets for training
FR models, along with the number of images, identities, and images per identity, and whether each database
is public and/or still accessible. Note that many of the public databases are no longer accessible (raising a practical
problem for researchers and developers) due to legal and ethical concerns, and even those that
are available are ethically questioned as the individual consent of the data subjects is not always ensured.


Name               | Year | # Images (m) | # Identities (k) | Avg. | Public | Accessible | Authentic
CASIA-WebFace [3]  | 2014 | 0.5          | 10.6             | 47   | ✓      | ✗          | ✓
DeepFace [19]      | 2014 | 4.4          | 4.0              | 1092 | ✗      | ✗          | ✓
FaceNet [20]       | 2015 | 200.0        | 8,000.0          | 25   | ✗      | ✗          | ✓
Facebook [21]      | 2015 | 500.0        | 10,000.0         | 50   | ✗      | ✗          | ✓
VGGFace [22]       | 2015 | 2.6          | 2.6              | 992  | ✓      | ✗          | ✓
CelebFaces [23]    | 2016 | 0.09         | 5.4              | 16   | ✓      | ✓          | ✓
MS-Celeb-1M [4]    | 2016 | 10           | 100.0            | 100  | ✓      | ✗          | ✓
MegaFace2 [24]     | 2017 | 4.7          | 672.0            | 7    | ✓      | ✓          | ✓
UMDFaces [25]      | 2017 | 0.4          | 8.3              | 46   | ✓      | ✗          | ✓
VGGFace2 [5]       | 2018 | 3.3          | 9.1              | 363  | ✓      | ✗          | ✓
IMDbFace [26]      | 2018 | 1.7          | 59.0             | 29   | ✓      | ✓          | ✓
MS1MV2 [2, 4]      | 2019 | 5.8          | 85.0             | 68   | ✓      | ✗          | ✓
MillionCelebs [27] | 2020 | 18.8         | 636.0            | 30   | ✗      | ✗          | ✓
WebFace260M [28]   | 2021 | 260          | 4,000.0          | 65   | ✓      | ✓          | ✓
WebFace42M [28]    | 2021 | 42           | 2,000.0          | 21   | ✓      | ✓          | ✓
SynFace [17]       | 2021 | 0.5          | 10               | 50   | ✓      | ✓          | ✗
DigiFace-1M-A [29] | 2022 | 0.72         | 10               | 72   | ✓      | ✓          | ✗
DigiFace-1M-B [29] | 2022 | 0.5          | 100              | 5    | ✓      | ✓          | ✗
SFace [16]         | 2022 | 0.63         | 10.6             | 60   | ✓      | ✓          | ✗
USynthFace [18]    | 2022 | 0.4          | 0.4              | 1    | ✓      | ✓          | ✗


training requirements. If the model is trained in 
one of the two approaches mentioned above, then 
the synthetic data has to contain a large number 
of identities and multiple samples of each identity.
If the model is trained partially on authentic
data whose intra-class variation is low,
then the synthetic data needs to contain
multiple samples for each of the authentic
identities, i.e. act as an augmentation strategy.
Finally, if the FR model is trained in an unsupervised
manner, then the synthetic training data
need not be grouped by identity,
but rather just requires a set of faces of random
identities. Such data has also been shown
to be successful in the quantization-aware
training of models based on
full precision parameters [32]. Although it is out
of the scope of this work, synthetic faces of this
kind can also be used to train face detectors,
face segmentation, and attack detection meth- 
ods (e.g. morphing attack detection [33]). 


2. Evaluating FR: FR algorithmic evaluation, following
ISO/IEC 19795-1:2021 [30], requires
the existence of a large set of genuine
(same identity) and imposter (different identity) 
face image pairs that represent the real opera- 
tional scenario. The need for a large number 
of these pairs is intensified by the ever-more 
accurate performance of FR algorithms. FR 
algorithms can produce two main algorithmic
errors: genuine pairs classified wrongly as imposters
(false non-match (FNM)) or imposter
pairs classified wrongly as genuine (false match
(FM)). As the algorithms produce lower and lower
rates of decision errors, i.e. the FM rates (FMR) and
FNM rates (FNMR), the number of evaluated
pairs required to produce statistically significant
evaluation results becomes higher. This need for
large-scale evaluation data is one of the main mo- 
tivations behind requiring synthetic data for the 
evaluation. Another reason is that some authorities
that require in-house testing on their own
data when purchasing FR solutions possess only
a single image per identity in their databases
(think of visa systems), and thus it is impossible
to form genuine pairs to evaluate FR algorithms.
Such situations would require synthetic data to 
be generated so it belongs to a certain authentic 
identity, but with realistic variations. In a third
scenario, where the operational setting requires
a very low FMR, a huge number
of imposter pairs is required to evaluate the FMR
with statistical significance. In such cases,
random synthetic faces with random identities 
can be used to create such imposter pairs. Again, 
although it is out of the scope of this work, these 
synthetic faces, regardless of their identity infor- 
mation, can be used to evaluate face detectors, 
face segmentation, and presentation/morphing 
attack detection. 


3. Attacking FR: Commonly, developers would
use technology to enhance the convenience and 
security of individuals and societies. However, 
technology can also be used maliciously to create 
attacks on individuals, systems, and societies. 
This is the case also with synthetic face data, 
which can also be used as an attack. Synthetic 
data can be created so that a certain face can 
be matched with two or more faces. This can 
target automatic FR comparison or human im- 
age verification, or both. Such attacks can be 
face morphing attacks, where an image is generated
to match two or more identities and then used
on an identity or travel document with the alphanumeric
data of one of the targeted identities.
Later, such a document can be used illegally by the other
targeted identities, leading to a serious
security threat. Another attack in the same cat- 
egory is the MasterFace attack, where the syn- 
thetic face is created to match a wider proportion 
of the population, raising many security threats. 


The second type of attack by generated face im- 
ages might focus on generating a face image of a 
specific identity. Such attacks are commonly re- 
ferred to as Deep-Fakes and they are commonly 
used to fool the viewer into wrongly believing 
that a certain person has said or done an action 
in an image or a video. A third attack can use 
synthetic faces that maintain a certain identity 
but exclude a specific pattern with the aim of
attacking a biometric-based system that ensures
a legal operation of a process. Such an attack can
consist of presenting the attacker’s real identity, but
excluding the information that points out that 
the user is underage, in a service that requires 
age verification. 


4. Enhancing the privacy for FR users: Al- 
though excluding certain patterns from gener- 
ated images of specific identities can be seen as 
an attack on biometric systems, in different use- 
cases, they can be seen as a privacy-enhancing 
tool when they are used to avoid the illegal or 
unconsented processing of the data. Such gener- 
ation of the data aims at maintaining a certain 
set of visual patterns but removing the clues of a 
specific pattern. Depending on the use-case, this 
excluded pattern can be related to the identity 
in what is widely known as image-level face de-identification,
which is defined under the standard
ISO/IEC 20889:2018 [34]. The excluded
pattern can be related to certain soft biomet- 
ric attributes like age or gender, which is com- 
monly referred to as soft-biometric privacy en- 
hancement. Although it is out of the scope of 
this work, the generated faces can exclude patterns
that make them detectable to face detection
tools, i.e. excluding the information that
makes the face a face in the view of automatic 
face detection. 
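
The two error rates described under "Evaluating FR" above, and the reason why very low FMR targets demand very many imposter pairs, can be sketched as follows. This is a minimal sketch and not part of the original work. It assumes similarity scores where a higher value means a better match, a "greater than or equal" decision rule, and the common "rule of three" heuristic (zero observed errors in N independent comparisons bound the error rate by roughly 3/N at 95% confidence). The function names are hypothetical.

```python
import math

import numpy as np


def error_rates(genuine_scores, imposter_scores, threshold):
    """Return (FMR, FNMR) at a decision threshold.

    Assumption: scores are similarities, and a pair is declared a
    match when its score is >= threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    imposter = np.asarray(imposter_scores, dtype=float)
    fmr = float(np.mean(imposter >= threshold))  # imposter pairs accepted
    fnmr = float(np.mean(genuine < threshold))   # genuine pairs rejected
    return fmr, fnmr


def min_pairs_rule_of_three(target_rate):
    """Number of independent comparisons needed so that observing zero
    errors bounds the error rate by target_rate (about 95% confidence)."""
    if not 0.0 < target_rate < 1.0:
        raise ValueError("target_rate must lie strictly between 0 and 1")
    return math.ceil(3.0 / target_rate)


if __name__ == "__main__":
    fmr, fnmr = error_rates([0.9, 0.8, 0.4], [0.1, 0.2, 0.7, 0.3], 0.5)
    print(f"FMR={fmr:.2f}, FNMR={fnmr:.2f}")
    print(min_pairs_rule_of_three(0.001))
```

Under these assumptions, a target FMR of 10^-6 already calls for roughly three million independent imposter comparisons, which is the scale at which random synthetic identities become attractive. Pairs that share images are not independent, so the practical requirement may be higher.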


So far, we have presented a discussion on the possible
use-cases of synthetic face data in FR. Each of these 
use-cases has different needs when it comes to syn- 
thetic data. These needs are discussed in the next 
section. 


Figure 1: A taxonomy of the synthetic data use-cases (on the top of the figure) directly interacting with
FR models, either by training them, evaluating them, attacking them, or enhancing the privacy of the
information extracted by them. This taxonomy lists the existing and foreseen synthetic data types that are
needed by these use-cases (under each use-case). These data needs are grouped by their main properties by
color and discussed, along with the use-cases, in this paper.


3 What data is needed and what properties make it
good?


The properties of the needed synthetic data under 
the different use-cases (discussed in the previous sec- 
tion) are grouped by their required properties under 
different colors in Figure 1 and are discussed in detail 
in the following: 


1. Single faces of random identities: As detailed
in the previous section, and illustrated in
Figure 1, synthetic face images of random identities,
without the requirement that multiple images
belong to one identity, can be used for training
FR models in an unsupervised manner. They
additionally can be used to evaluate FR models, 
specifically evaluate the FMR, especially when 
the targeted operational point is at a very low 
FMR, requiring an extremely large number of 
diverse imposter pairs to make the evaluation 
result statistically significant. Here, such data 
should be realistic, i.e. act like authentic data 
when processed by the FR model. A successful 
way to measure that was proposed in [32] and 
it is based on comparing the activation function 
value ranges in the FR model when processing 
authentic data versus when processing the syn- 
thetic data. Additionally, the distribution of the 
comparison scores between pairs of these single
images of random identities should theoretically
be similar to those of imposter comparisons of 
authentic data to ensure the similarity to the 
authentic inter-identity variation, which was ex- 
plored in [18]. 


2. Multiple faces per random identity: This
kind of data represents what one would typically
expect from FR training or evaluation data.
That is, multiple identities, with multiple images
per identity. This, given sufficient inter-
and intra-class (identity) variation, can be used
to train an FR model in a supervised manner.
It would also contain both imposter and genuine
pairs to evaluate the performance of FR
by calculating both possible error rates, FMR and
FNMR. Such data should also interact with the
FR model similarly to authentic data. As
mentioned earlier, this can be measured by monitoring
the value range of the model’s activation
functions. Here, the data should possess the inter-
and intra-class variability of the targeted authentic
data scenario. We specify “targeted”
here as different evaluation and training goals 
of FR might occur, e.g. a model is evaluated 
specifically for cases with an extreme pose or 
extreme age differences between the comparison 
pairs (intra-class variations), or for cases of pairs 
of twins or siblings (inter-class variation). This 
goes for training as well, as an FR model can be 
trained to specifically be tolerant to mask occlusions,
and thus the inter- and intra-class diversity
of the training data should represent that. The suitability
of such data can be measured by comparing
its genuine and imposter comparison score
distributions with those of the targeted authentic
data (which can be much smaller in size), as performed
in [16]. For specifically targeted attribute
variations, such as age and pose, attribute pre- 
dictors can be used to ensure the existence of 
such attribute variations in the synthetic data 
to the same degree as the authentic data. 


3. Multiple faces of an existing identity: Authentic
face data with insufficient intra-class
variation is problematic for the training and evaluation
of FR. In terms of training an FR model,
such data will lead to models that are not trained
to tolerate intra-class variation (e.g. pose, ex- 
pressions, age, illumination, etc.) and thus are 
expected to lead to high FNMR in practical op- 
erations. When evaluating FR, evaluation data 
in some practical cases such as an authority that 
possesses only a single (or few) images per iden- 
tity (e.g. visa applicant database) would not be 
sufficient to evaluate the expected FNMR as no 
(or few) genuine pairs exist in the data. Both 
cases require acquiring more samples of each of 
the existing identities. These samples have to 
be of realistic variation that matches the tar- 
geted scenario. Such samples might be created 
synthetically and would act as an augmentation 
approach when training an FR model, or as addi- 
tional samples to create genuine pairs when eval- 
uating FR models (or training FR in a triplet 
loss-like strategy). Such synthetic data should
interact with the FR model similarly to authentic
data, as previously discussed. It should also
result in a genuine comparison score distribution
that matches the targeted authentic data scenario.
Note that this should hold when an existing
authentic sample is compared to the synthetic
images of the same identity, but also, if needed,
when the synthetically generated samples of
the same identity are compared with each other.


4. A face of multiple identities: A synthetic
face can also be used as an attack. The fact that
a face can be generated synthetically with properties
that enable an attack on identity systems
pushes researchers to anticipate such attacks. A
face can be synthesized in a way that it matches
two or more specific (known) identities to create
what is referred to as a morphing attack. A morphing
attack image is designed to match with
a number of specific identities and can be cre- 
ated on the image level by interpolating the im- 
ages of the targeted identities, or generated syn- 
thetically to possess the identity information of 
the targets [35]. Such an image, if used in association
with a passport or an identity document,
can enable multiple persons to be verified
against the alphanumeric information on the document. A
wider attack that surfaced lately in the literature
is the MasterFace attack, where the attack
image is synthesized to match a wide range of 
the population without the need to know the 
targeted identities [36]. As these attacks might 
be used to attack visual inspection, automatic 
verification, or both, they first have to have a 
natural appearance. This natural appearance 
is best measured by user studies, where individuals
are asked if an image appears realistic
or not. The vulnerability of automatic FR
to such attacks, and thus the measure of how
good the synthetic data is for its purpose, can
be measured using the Mated Morph Presentation
Match Rate (MMPMR) [37]. The MMPMR
refers to the fraction of morphs, relative to all
morphs, whose similarity to both identities used
to create the morph is above the selected FR
comparison score threshold, i.e. morphs that are
matched to every contributing identity.


5. A face of specific authentic identity: Synthesizing
a face of a specific authentic identity
is usually related to the need to synthesize this
face with a specific expression or in a specific domain,
unlike generating multiple faces of an existing
identity, where realistic variation is needed. This is
commonly related to what is referred to as Deep- 
Fake faces but also includes other face manipula- 
tion techniques such as expression and attribute 
manipulations. As such attacks aim at manipu- 
lating human viewers, their success is best mea- 
sured by how realistic they are to these view- 
ers and how well they succeeded in the targeted 
manipulation in the view of the viewers through 
user studies related to the exact goal of the ma- 
nipulation. However, more within the scope of 
this work is the ability of these attacks to fool au- 
tomatic FR and attack detection algorithms. A 
comprehensive survey on the issue of DeepFakes 
and facial image manipulation is presented by 
Tolosana et al. in [38]. 


6. A face that excludes a specific pattern:
A face synthesizing process can maintain a subset
of patterns from a specific face and exclude
other subsets of these patterns. Such patterns
can be identity information, age, gender, ethnicity,
or even the patterns that make a face detectable
as a face, among other attribute patterns.
Such a process can be seen as an attack
if it is aimed at circumventing a required, consented
process, such as automatic age verification to receive
a service or make an online purchase. However,
such a process can also be seen as a privacy
enhancement mechanism. Excluding the iden- 
tity, while maintaining the image appearance 
and other attributes to some degree is commonly 
referred to as image-level face de-identification 
and it aims at avoiding the unconsented iden- 
tification of face images, whether in the public 
or private space. A subset of this is to exclude 
the patterns of the face that make it detectable
and thus avoid further processing. Removing 
other patterns like gender or age falls within the 
image-level soft-biometric privacy enhancement 
techniques that aim at maintaining the identifi- 
cation possibilities without allowing unconsented 
estimation of soft-biometric attributes. Evaluat- 
ing the ability to synthesize these face images is 
based on evaluating the degree to which the pat- 
terns that need to be excluded and the ones that 
need to be maintained are detectable, where the
former need to be as undetectable as possible and
the latter need to be as detectable as possible.
A comprehensive survey and discussion of these
technologies is presented by Meden et al. in
[39].
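
Two of the measures mentioned in the list above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the implementation used in the cited works. It assumes similarity scores (higher means more similar) and, for the distribution comparison, cosine-like scores in [-1, 1]. The histogram overlap coefficient is a stand-in chosen here for illustration; [16] and [18] may compare score distributions differently. The function names are hypothetical.

```python
import numpy as np


def mmpmr(morph_scores, threshold):
    """Fraction of morphs matched to every contributing identity.

    morph_scores has shape (num_morphs, num_contributing_identities);
    entry [m, n] is the similarity between morph m and a probe image
    of its n-th contributing identity.
    """
    scores = np.asarray(morph_scores, dtype=float)
    weakest_match = scores.min(axis=1)  # the hardest identity to match
    return float(np.mean(weakest_match > threshold))


def histogram_overlap(scores_a, scores_b, bins=50, value_range=(-1.0, 1.0)):
    """Overlap coefficient in [0, 1] between two score histograms.

    1.0 means identical binned distributions, 0.0 means disjoint ones.
    Assumption: scores lie in value_range (e.g. cosine similarity).
    """
    hist_a, _ = np.histogram(scores_a, bins=bins, range=value_range)
    hist_b, _ = np.histogram(scores_b, bins=bins, range=value_range)
    if hist_a.sum() == 0 or hist_b.sum() == 0:
        raise ValueError("no scores fall inside value_range")
    p = hist_a / hist_a.sum()
    q = hist_b / hist_b.sum()
    return float(np.minimum(p, q).sum())


if __name__ == "__main__":
    # Three morphs, each compared against its two contributing identities.
    print(mmpmr([[0.8, 0.7], [0.9, 0.2], [0.6, 0.65]], threshold=0.5))
    # Synthetic vs. authentic imposter scores (toy values).
    print(histogram_overlap([0.10, 0.20, 0.30], [0.10, 0.20, 0.30]))
```

In use, the overlap would be computed separately for genuine and imposter scores, comparing the synthetic data against a (possibly much smaller) authentic set from the targeted scenario.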


4 Where are we now? 


4.1 Face image generation: 


A deep generative model (DGM) is a deep neu- 
ral network that is trained to interpret and model 
a probability distribution of the authentic training 
data. Specifically, a deep generative model takes 
random points from e.g. a Gaussian distribution and
maps them through a neural network such that the
generated distribution closely matches the authentic 
data distribution. The main DGM approaches that 


Table 2: Verification accuracies (%) on five different FR benchmarks achieved by the supervised and unsupervised
FR models trained on the synthetic training databases, along with the numbers of authentic and synthetic
training samples. The result in the first row is reported using an FR model trained on the authentic CASIA-WebFace dataset
[3] to give an indication of the performance of an FR model trained on authentic data.
To provide a fair comparison, all model results are obtained from the original published works using
the same network architecture (ResNet50) trained on training datasets of roughly the same size. KT refers to
knowledge transfer from a pretrained FR model. LFW [40], AgeDB-30 [41], CFP-FP [42], CA-LFW [43], and
CP-LFW [44] are widely used FR evaluation benchmarks.

Method           | Unsupervised | Data augmentation               | # Synthetic Images | # Authentic Images | KT | LFW   | AgeDB-30 | CFP-FP | CA-LFW | CP-LFW
CosFace [45]     | ✗            | -                               | 0                  | 500K               | ✗  | 99.55 | 94.55    | 95.31  | 93.78  | 89.95
SynFace [17]     | ✗            | GAN-based                       | 500K               | 0                  | ✗  | 91.93 | 61.63    | 75.03  | 74.73  | 70.43
DigiFace-1M [29] | ✗            | -                               | 500K               | 0                  | ✗  | 88.07 | 60.92    | 70.99  | 69.23  | 66.73
DigiFace-1M [29] | ✗            | Accessory + Geometric and color | 500K               | 0                  | ✗  | 95.40 | 76.97    | 87.40  | 78.62  | 78.87
SFace [16]       | ✗            | -                               | 634K               | 0                  | ✗  | 91.87 | 71.68    | 73.86  | 77.93  | 73.20
USynthFace [18]  | ✓            | GAN-based + Geometric and color | 400K               | 0                  | ✗  | 92.23 | 71.62    | 78.56  | 77.05  | 72.03
IDnet [46]       | ✗            | -                               | 528K               | 0                  | ✗  | 84.83 | 63.58    | 70.43  | 71.50  | 67.35
IDnet [46]       | ✗            | Geometric and color             | 528K               | 0                  | ✗  | 92.58 | 73.53    | 75.40  | 79.90  | 74.25
SynFace [17]     | ✗            | GAN-based                       | 500K               | 40K                | ✗  | 97.23 | 81.32    | 87.68  | 85.08  | 80.32
DigiFace-1M [29] | ✗            | Accessory + Geometric and color | 500K               | 40K                | ✗  | 99.05 | 89.77    | 94.01  | 90.08  | 87.27
SFace [16]       | ✗            | -                               | 634K               | 0                  | ✓  | 99.13 | 91.03    | 91.14  | 92.47  | 87.03


are proposed in the literature are the Variational Auto-Encoder
(VAE) [47], Generative Adversarial Network
(GAN) [48], Autoregressive model [49], Normalizing
Flows [50], and Diffusion Models (DiffModel)
[51], in addition to a large number of hybrid models
that combine two of the previous approaches, such
as GAN with VAE [52]. A comprehensive review of
deep generative modeling is presented in [53]. Each
of these approaches presented contributions towards
providing a better trade-off between generated sample
quality (i.e. producing samples of high perceived
quality and fidelity that resemble the DGM training
data), inference time (i.e. enabling a fast sampling
mechanism), architecture restrictions (i.e. some of the
DGMs are limited to a specific underlying network
architecture), and sample appearance variations.
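
The mapping described at the start of this subsection can be summarized in symbols. The notation below is an illustrative assumption rather than notation taken from the cited works: $G_{\theta}$ is the generator network, $p_z$ the latent prior, $p_{\theta}$ the distribution of generated samples, and $p_{\text{data}}$ the distribution of the authentic training data.

```latex
z \sim p_z = \mathcal{N}(0, I), \qquad
\hat{x} = G_{\theta}(z), \qquad
\theta \ \text{trained such that} \ p_{\theta} \approx p_{\text{data}}
```

The listed families differ mainly in how this closeness is enforced during training, for example through a likelihood bound (VAE), an adversarial discriminator (GAN), or a denoising objective (diffusion models).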


4.2 How do the DGM approaches 
match the needed synthetic face 
data properties? 


• Single faces of random identities: DGM approaches
such as StyleGAN [54] presented very
promising results in generating single faces of
random synthetic identities with high visual fidelity.
However, the generated faces could share
identity information, to a small degree, with
the DGM’s original training data (as reported in [55, 16]).


• Multiple faces per random identities: Approaches
such as Face-ID-GAN [56], DiscoFaceGAN
[57], GAN-Control [58], InterFaceGAN
[59], and CONFIG [52] proposed GAN models
based on disentangled representation learning
to conditionally generate face images from synthetic
identities with predefined attributes, e.g.
age, pose, illumination, or expression. As the generated
images are explicitly controlled by a predefined
set of attributes, such images might lack
the intra-class diversity that exists in real-world
face data and that is needed to train and evaluate
FR.


• Multiple faces of an existing identity: DGM approaches
such as CONFIG [52] are able to regenerate
multiple faces of an existing identity
by reconstructing input faces with a predefined
set of attributes such as changing expression,
wearing sunglasses, adding makeup, or changing
hair color. However, such attribute manipulation
approaches might induce some artifacts in
the reconstructed faces, which might affect identity
preservation between the input and the reconstructed
faces. Also, as such approaches
explicitly manipulate the attributes of their input
faces, the generated faces might not contain
the large appearance variations that are needed
to train and evaluate FR models. More importantly,
identity preservation in reconstructed
samples is rarely evaluated and reported. 


• A face of multiple identities: DGM approaches
were not explicitly designed and trained to generate
a face of multiple identities. However, recent
works such as MorGAN [35], MIPGAN [60],
and MorDIFF [61] make use of generative models
to generate a face of multiple identities by
interpolating two or more latent vectors of synthetic
or real faces and then generating a new
face from the interpolated vector. In a similar manner,
but with latent vector optimization rather
than interpolation, MasterFaces [36] are generated
to match unknown identities.


• A face of specific authentic identity: DGM approaches
that targeted image-to-image modeling
achieved impressive results in generating a
face of a specific authentic identity. This has been
commonly achieved by manipulating the input 
source face to match specific attributes or a tar- 
get domain while maintaining the identity in- 
formation of the source image. Although such 
approaches did not target generating Deep-Fake 
attacks, they have been widely used in generat- 
ing such kinds of attacks [38]. 


• A face that excludes a specific pattern: None
of the SOTA DGM approaches explicitly target 
generating a face that excludes a specific pattern. 
A number of works make use of DGM approaches 
to exclude a specific pattern e.g. identity, age, or 
gender of authentic input faces, especially when 
such models include attribute disentanglement. 
However, to the best of our knowledge, none of 
the previous works presents solutions to generate
a face of a synthetic identity that excludes a
specific pattern; rather, this is done for faces of
authentic identities. An overview of the current 
state of this issue can be found in [39]. 
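
The latent interpolation step mentioned above for faces of multiple identities can be sketched as follows. This is a minimal sketch, assuming each contributing face has already been mapped to a latent vector of the generative model. It shows only the interpolation itself; the cited methods add further steps (such as encoding the input faces or optimizing the latent vector) that are not reproduced here. The function name is hypothetical, and decoding the result into an image would require a trained generator, which is outside this sketch.

```python
import numpy as np


def interpolate_latents(latents, weights=None):
    """Weighted average of latent vectors, one row per contributing identity.

    latents: array-like of shape (num_identities, latent_dim).
    weights: optional weights summing to 1; defaults to an equal mix.
    """
    z = np.asarray(latents, dtype=float)
    if z.ndim != 2:
        raise ValueError("latents must have shape (num_identities, latent_dim)")
    if weights is None:
        weights = np.full(len(z), 1.0 / len(z))
    w = np.asarray(weights, dtype=float)
    if w.shape != (len(z),) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must have one entry per latent and sum to 1")
    return w @ z  # shape: (latent_dim,)


if __name__ == "__main__":
    z_a = [0.0, 2.0]  # toy latent vector of identity A
    z_b = [2.0, 0.0]  # toy latent vector of identity B
    print(interpolate_latents([z_a, z_b]))                # equal mix
    print(interpolate_latents([z_a, z_b], [0.25, 0.75]))  # biased towards B
```

An equal mix is the natural default when the morph should match all contributing identities equally well; unequal weights bias the result towards one identity.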


4.3 What is the current state of the 
defined use-cases? 


Very recently, a few works have built on existing DGM approaches
to propose FR based on synthetic data. The


Figure 2: Samples of the synthetic data used in SynFace
[17], USynthFace [18], DigiFace-1M [29], SFace [16],
and IDnet [46]. The high variation in SFace images
in comparison to the other synthetic datasets can be
clearly noticed. Although SynFace and USynthFace
utilized the same DGM (DiscoFaceGAN), the additional
appearance variation introduced in USynthFace by
geometric and color transformations can also be observed.


following discussion presents the use of synthetic data 
in FR grouped by the use-cases (discussed earlier in 
this paper and presented in Figure 1). 


4.3.1 Training FR 


Recently, synthetically generated face data has been
proposed as an alternative to privacy-sensitive
authentic data for training FR models, mitigating the
technical, ethical, and legal concerns of using
authentic biometric data. The approaches currently
proposed in the literature utilized synthetically
generated data to train unsupervised (USynthFace [18])
or supervised FR models (SFace [16], SynFace [17],
DigiFace-1M [29], and IDnet [46]). Training an
unsupervised FR model as in USynthFace requires that
the training data maintain property 1 (Section 2), i.e.
single face of random identities, while the supervised
approaches, SFace, SynFace, IDnet, and DigiFace-1M,
require that the training data maintain property 2
(Section 2), i.e. multiple faces per random identities.
Some of these approaches, SynFace and DigiFace-1M,
proposed combining authentic with synthetic data during
training, or transferring knowledge from a pretrained
FR model, to improve the recognition accuracies. Others
(USynthFace) utilized only synthetic data for FR
training. Most synthetic FR approaches utilized
GAN-based (USynthFace, SynFace) and/or geometric and
color transformation (USynthFace, IDnet, and
DigiFace-1M) data augmentation methods to create more
challenging training samples and thereby improve the
model recognition accuracies. Table 2 summarizes the
accuracies achieved on five FR benchmarks by recent FR
models trained on synthetic data. The results reported
in Table 2 show that including data augmentation in FR
model training significantly improved the recognition
accuracies. Also, the unsupervised FR model (USynthFace
[18]), although trained on unlabeled data, obtained
results that are very competitive with those of the
supervised synthetic-based FR models.
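
As an illustration of the geometric and color
transformations mentioned above, the following minimal
Python sketch applies a random horizontal flip and a
random brightness shift to an image stored as nested
lists. It is not the augmentation policy of any of the
cited works; the function names (horizontal_flip,
shift_brightness, augment) and the parameters max_delta
and flip_prob are assumptions made for this example.

```python
import random


def horizontal_flip(image):
    """Geometric transformation: mirror each row left to right."""
    return [list(reversed(row)) for row in image]


def shift_brightness(image, delta):
    """Color transformation: add delta to every pixel, clipped to [0, 255]."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def augment(image, rng, max_delta=40, flip_prob=0.5):
    """Return a randomly flipped and brightness-shifted copy of the image."""
    out = [list(row) for row in image]  # copy so the input stays unchanged
    if rng.random() < flip_prob:
        out = horizontal_flip(out)
    delta = rng.randint(-max_delta, max_delta)
    return shift_brightness(out, delta)


# Toy 2x3 grayscale "image" (hypothetical data, not a real face).
image = [[10, 20, 30], [40, 50, 60]]
rng = random.Random(0)
views = [augment(image, rng) for _ in range(4)]
```

Each call to augment yields another view of the same
synthetic sample, which is the sense in which such
transformations add appearance variation to a training
set.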


4.3.2 Evaluating FR 


A few works proposed the use of synthetic data for
evaluating FR. SynFace [17] presented a synthetic
version of the Labeled Faces in the Wild (LFW)
dataset [40] and evaluated two FR models, one trained
on authentic data and one trained on synthetic data, on
this synthetic version of LFW. The model trained on
real data achieved an accuracy of 98.85% and the
one trained on synthetic data achieved an accuracy
of 99.98%. The work [17] also attributed the
difference in verification performance between the two
models to the domain gap between synthetic and real
training images.
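
Verification accuracies such as those above are
obtained by comparing pairs of face embeddings against
a decision threshold. The sketch below shows this
computation under simplifying assumptions: embeddings
are plain lists of floats, the comparison score is
cosine similarity, and a single fixed threshold is
used, whereas benchmark protocols usually select the
threshold by cross-validation. The function names and
the toy embeddings are hypothetical and are not taken
from [17].

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verification_accuracy(pairs, labels, threshold):
    """Fraction of pairs whose same/different decision matches the label.

    pairs  -- list of (embedding_1, embedding_2) tuples
    labels -- list of booleans, True if both images show the same identity
    """
    correct = 0
    for (e1, e2), same in zip(pairs, labels):
        predicted_same = cosine_similarity(e1, e2) >= threshold
        correct += predicted_same == same
    return correct / len(pairs)


# Toy 2-D embeddings (hypothetical): two genuine and two impostor pairs.
pairs = [
    ([1.0, 0.0], [0.9, 0.1]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.0, 1.0], [0.1, 0.9]),
    ([1.0, 0.0], [0.8, 0.6]),
]
labels = [True, False, True, False]
accuracy = verification_accuracy(pairs, labels, threshold=0.5)
```

With these toy values, the last impostor pair has a
similarity of 0.8 and is wrongly accepted at a
threshold of 0.5, so three of the four decisions are
correct.
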


4.3.3 Attacking FR


DGM approaches have been widely and successfully
utilized to generate morphing, MasterFace, deep-fake,
and manipulation attacks on FR. Researchers generally
attempt to foresee such attacks and evaluate their
potential. Deep-fake and face manipulation attacks are
already a serious problem facing modern societies, and
their generation is becoming more available and
realistic with time [38]. Morphing attacks based on
synthesized faces are a serious threat, and the
vulnerability of FR to them is getting close to that of
image-level morphing [60]. MasterFace attacks are
relatively new; their initially proposed form is based
on optimization on a relatively weak FR model [36],
with other works debating their feasibility [62]. On
the other hand, synthetic data has helped create
privacy-friendly databases for the detection of such
attacks, specifically the morphing attack [33, 63] and
the face presentation attack [64]. Huber et al. [63]
organized a competition on face morphing attack
detection (MAD) based on privacy-aware synthetic
training data [33]. The competition aimed at promoting
the use of synthetic data to develop MAD solutions and
attracted 12 solutions from both academia and industry.


4.3.4 Privacy enhancement 


The main advances in this respect fall under one of two
categories, de-identification or soft-biometric
privacy. De-identification can be achieved by adding
adversarial noise to the image, image obfuscation,
and image synthesis, the latter being the core focus
of this work. Many solutions have been proposed in
the literature, with a recent overview of these
solutions presented in [39]. The main challenge so far
in this domain is the cross-FR model performance: most
works showed very good performance on the FR models
that were used to optimize the solution, but this
performance drops when other, unknown FR models are
used. Synthesis-based soft-biometric privacy followed a
similar trend to de-identification, however, with a
much smaller presence in the literature. In this
aspect, many works focused on soft-biometric privacy at
the template level rather than the image level. Image
and template level techniques are surveyed in [39]. An
example of image-based techniques is FlowSAN [65],
which aims at minimizing gender information in the
resulting images. Here, as the target is the
soft-biometrics and not the identity, the main
challenge is to achieve generalized performance across
soft-biometric estimators while maintaining FR
performance across FR models.
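
The cross-FR model challenge described in this
subsection can be made concrete with a small
bookkeeping sketch. It assumes that, for each FR model,
comparison scores between original and de-identified
images are already available together with that model's
decision threshold; the model names, scores, and
thresholds below are hypothetical, and the functions
are an illustration rather than an established
evaluation protocol.

```python
def reidentification_rate(scores, threshold):
    """Fraction of (original, de-identified) pairs an FR model still matches."""
    return sum(s >= threshold for s in scores) / len(scores)


def cross_model_report(scores_by_model, thresholds, seen_model):
    """Compare the FR model used for optimization with unseen FR models.

    Returns the per-model re-identification rates and the gap between the
    worst unseen model and the seen model (larger gap = weaker generalization).
    """
    rates = {
        name: reidentification_rate(scores, thresholds[name])
        for name, scores in scores_by_model.items()
    }
    unseen_rates = [r for name, r in rates.items() if name != seen_model]
    gap = max(unseen_rates) - rates[seen_model]
    return rates, gap


# Hypothetical comparison scores for four de-identified images.
scores_by_model = {
    "fr_seen": [0.10, 0.20, 0.15, 0.40],      # model used during optimization
    "fr_unseen_a": [0.50, 0.20, 0.60, 0.45],  # unknown model A
    "fr_unseen_b": [0.35, 0.10, 0.20, 0.50],  # unknown model B
}
thresholds = {"fr_seen": 0.3, "fr_unseen_a": 0.3, "fr_unseen_b": 0.3}
rates, gap = cross_model_report(scores_by_model, thresholds, "fr_seen")
```

In this toy example the de-identified images are
re-identified in 25% of the cases by the model used for
optimization but in up to 75% of the cases by an unseen
model, which is the kind of drop described above.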


5 Where can we do better? 


Here, based on the discussed use-cases taxonomy, the 
synthetic data requirements, and their current state 
along with the generation process, we discuss the 
main issues where further improvement in future re- 
search can have a strong effect on the use of synthetic 
data in FR. The following discussion will touch on the 
generation process, the defined use-cases, as well as 
the general lack of well-defined suitability evaluation 
protocols. 


5.1 Face image generation 


Generating realistic and high-quality samples, along
with enabling high sampling speed and high-resolution
scaling, has driven the main contributions of recent
generative models proposed in the literature. In
addition, some DGM approaches targeted specific
applications such as image in-painting, attribute
manipulation, face aging, image super-resolution, and
image-to-image and text-to-image translation. Such
applications mainly require that the generated samples
are of high visual fidelity, with less focus on the
identity information, which might be less than optimal
for biometric applications. When developing DGMs for
FR use-cases, the solution should focus on the utility
of the generated images for the given task rather than
only on the human-perceived quality. The emerging works
on training FR solutions, presented earlier, are
considered the first step in this regard. This focus on
utility, rather than only the perceived quality, should
be the main driver of future research when synthesizing
images for FR.


5.2 Training FR


Recent works that proposed the use of synthetic 
face data for FR utilized deep neural network ar- 
chitectures with hyper-parameters that are opti- 
mized on authentic data. Such training paradigms 
might be sub-optimal for learning face representa- 
tions from synthetic data. Future research works 
might target proposing network architectures or 
training paradigms designed specifically to learn from 
synthetic data. In general, FR solutions trained on
synthetic data still fall behind those trained on
authentic data in terms of accuracy, which is the main
shortcoming that currently hinders placing such
solutions in practical use. However, one must
keep in mind that training FR on synthetic data is
a very recently emerging research direction and that it
is already achieving higher recognition accuracies than
solutions trained on authentic data less than a decade
ago [19].


5.3 Evaluating FR 


The need for large-scale FR evaluation datasets that 
represent real scenario variations is the main motiva- 
tion for future research directions on synthetic data 
for FR evaluation. Although DGMs can generate
arbitrary realistic face images, ensuring the utility
of the generated images for FR remains challenging.
Future research directions include, but are not limited
to, DGMs for generating multiple faces of existing
authentic identities, which might target specific
variations such as age and pose, and generating
complete evaluation datasets of multiple images of
multiple identities.


5.4 Attacking FR 


Even though creating novel attacks on identity
management systems, and on society in general, sounds
like a seriously malicious action, it is essential to
foresee the attacks that real attackers may create in
order to better enable their detection. As the
attackers would ask, the researchers should also ask
“What is the strongest attack I can create to serve the
attack goals given the current state of basic
technology?” This follows the never-ending game of cat
and mouse between attacks and attack mitigation.
Therefore, the constant struggle here is to always try
to foresee new attacks and attack generation
methodologies and to analyze their strengths and
weaknesses, leading to better mitigation strategies.


5.5 Privacy enhancement 


The main challenge to generative face privacy
enhancement is the generalizability and robustness it
must possess to operate in real-world applications.
This generalization must ensure that the
de-identification properties are strongly maintained
even with unknown FR solutions. The same goes for
soft-biometric privacy, where the privacy-enhanced
images should maintain their privacy properties when
processed by diverse soft-biometric estimators with
different levels of knowledge [66]. Other open issues
that still require increasing attention are the lack of
clear quantifiability and provability of privacy
enhancement, the limited public benchmarks, and the
need for controllable privacy, where the user can
choose which information is privatised [39].


5.6 Evaluation protocols 


In this work, we provided an initial discussion of what
synthetic data is needed for different FR use-cases
and what properties are needed from such data based
on the way it is used. However, this initial discussion
should evolve into a much-needed set of evaluation
metrics and protocols that can precisely and comparably
answer the question of “How well does the created data
fit its targeted properties within its use-case?”
Beyond the needed academic efforts in this regard, and
building on them, there is a need for such protocols
and metric standards on the industrial level, given
that synthetic data is foreseen to be a commodity. A
clear candidate to develop such a standard would be
ISO/IEC JTC 1/SC 37 Working Group 5 on biometric
testing and reporting.
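
One conceivable ingredient of such a protocol, shown
here only as an illustration and not as a metric
defined in this work, is a measure of how well a
synthetic dataset separates identities. The sketch
below computes the Fisher discriminant ratio between
genuine comparison scores (same synthetic identity) and
impostor comparison scores (different synthetic
identities). The function name and all scores are
hypothetical.

```python
from statistics import mean, pvariance


def fisher_discriminant_ratio(genuine, impostor):
    """Squared distance of the score means over the sum of score variances.

    Higher values mean genuine and impostor scores are better separated.
    """
    numerator = (mean(genuine) - mean(impostor)) ** 2
    denominator = pvariance(genuine) + pvariance(impostor)
    return numerator / denominator


# Hypothetical comparison scores from an FR model on a synthetic dataset.
fdr_separated = fisher_discriminant_ratio([0.8, 0.9, 0.7], [0.1, 0.2, 0.3])
fdr_overlapping = fisher_discriminant_ratio([0.5, 0.6, 0.4], [0.4, 0.5, 0.6])
```

A dataset meant to provide multiple faces per identity
would be expected to yield a clearly higher ratio than
one whose identities are not separable, as in the
second toy case, where the two score sets have the same
mean.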


6 Conclusion


The use of authentic data in FR poses technical,
legal, and ethical concerns. However, such data plays
a major role in training and evaluating FR, enhancing
FR user privacy, and even attacking FR. This work
provided initial discussions on the use of synthetic
data in FR as an alternative to authentic data. We
started by analysing and defining taxonomies for the
different possible FR use-cases in which synthetic data
can be used. Then, we discussed the needed properties
of synthetic data under each FR use-case. This
was followed by presenting the current state of
synthetic FR. Finally, we provided several interesting
directions of work that can be investigated in the
future. As a concluding remark, the use of synthetic
data in different FR use-cases is still in the early
research stage, and this work provides a base
discussion on this research direction and aims at
motivating and promoting further research works toward
responsible FR development.